Arm64: Implement LoadVector64x*AndUnzip and LoadVector128x*AndUnzip APIs #94128

Merged (16 commits) on Oct 30, 2023

TIHan (Contributor) commented Oct 28, 2023

Adds LoadVector64x*AndUnzip and LoadVector128x*AndUnzip APIs

    // LD1 (multiple structures) 2 register variant
    public static unsafe (Vector64<byte>   Value1, Vector64<byte>   Value2) LoadVector64x2AndUnzip(byte*   address);
    public static unsafe (Vector64<sbyte>  Value1, Vector64<sbyte>  Value2) LoadVector64x2AndUnzip(sbyte*  address);
    public static unsafe (Vector64<short>  Value1, Vector64<short>  Value2) LoadVector64x2AndUnzip(short*  address);
    public static unsafe (Vector64<ushort> Value1, Vector64<ushort> Value2) LoadVector64x2AndUnzip(ushort* address);
    public static unsafe (Vector64<int>    Value1, Vector64<int>    Value2) LoadVector64x2AndUnzip(int*    address);
    public static unsafe (Vector64<uint>   Value1, Vector64<uint>   Value2) LoadVector64x2AndUnzip(uint*   address);
    public static unsafe (Vector64<float>  Value1, Vector64<float>  Value2) LoadVector64x2AndUnzip(float*  address);

    // LD1 (multiple structures) 3 register variant
    public static unsafe (Vector64<byte>   Value1, Vector64<byte>   Value2, Vector64<byte>   Value3) LoadVector64x3AndUnzip(byte*   address);
    public static unsafe (Vector64<sbyte>  Value1, Vector64<sbyte>  Value2, Vector64<sbyte>  Value3) LoadVector64x3AndUnzip(sbyte*  address);
    public static unsafe (Vector64<short>  Value1, Vector64<short>  Value2, Vector64<short>  Value3) LoadVector64x3AndUnzip(short*  address);
    public static unsafe (Vector64<ushort> Value1, Vector64<ushort> Value2, Vector64<ushort> Value3) LoadVector64x3AndUnzip(ushort* address);
    public static unsafe (Vector64<int>    Value1, Vector64<int>    Value2, Vector64<int>    Value3) LoadVector64x3AndUnzip(int*    address);
    public static unsafe (Vector64<uint>   Value1, Vector64<uint>   Value2, Vector64<uint>   Value3) LoadVector64x3AndUnzip(uint*   address);
    public static unsafe (Vector64<float>  Value1, Vector64<float>  Value2, Vector64<float>  Value3) LoadVector64x3AndUnzip(float*  address);
    
    // LD1 (multiple structures) 4 register variant            
    public static unsafe (Vector64<byte>   Value1, Vector64<byte>   Value2, Vector64<byte>   Value3, Vector64<byte>   Value4) LoadVector64x4AndUnzip(byte*   address);
    public static unsafe (Vector64<sbyte>  Value1, Vector64<sbyte>  Value2, Vector64<sbyte>  Value3, Vector64<sbyte>  Value4) LoadVector64x4AndUnzip(sbyte*  address);
    public static unsafe (Vector64<short>  Value1, Vector64<short>  Value2, Vector64<short>  Value3, Vector64<short>  Value4) LoadVector64x4AndUnzip(short*  address);
    public static unsafe (Vector64<ushort> Value1, Vector64<ushort> Value2, Vector64<ushort> Value3, Vector64<ushort> Value4) LoadVector64x4AndUnzip(ushort* address);
    public static unsafe (Vector64<int>    Value1, Vector64<int>    Value2, Vector64<int>    Value3, Vector64<int>    Value4) LoadVector64x4AndUnzip(int*    address);
    public static unsafe (Vector64<uint>   Value1, Vector64<uint>   Value2, Vector64<uint>   Value3, Vector64<uint>   Value4) LoadVector64x4AndUnzip(uint*   address);
    public static unsafe (Vector64<float>  Value1, Vector64<float>  Value2, Vector64<float>  Value3, Vector64<float>  Value4) LoadVector64x4AndUnzip(float*  address);

    // LD1 (multiple structures) 2 register variant
    public static unsafe (Vector128<byte>   Value1, Vector128<byte>   Value2) LoadVector128x2AndUnzip(byte*   address);
    public static unsafe (Vector128<sbyte>  Value1, Vector128<sbyte>  Value2) LoadVector128x2AndUnzip(sbyte*  address);
    public static unsafe (Vector128<short>  Value1, Vector128<short>  Value2) LoadVector128x2AndUnzip(short*  address);
    public static unsafe (Vector128<ushort> Value1, Vector128<ushort> Value2) LoadVector128x2AndUnzip(ushort* address);
    public static unsafe (Vector128<int>    Value1, Vector128<int>    Value2) LoadVector128x2AndUnzip(int*    address);
    public static unsafe (Vector128<uint>   Value1, Vector128<uint>   Value2) LoadVector128x2AndUnzip(uint*   address);
    public static unsafe (Vector128<long>   Value1, Vector128<long>   Value2) LoadVector128x2AndUnzip(long*   address);
    public static unsafe (Vector128<ulong>  Value1, Vector128<ulong>  Value2) LoadVector128x2AndUnzip(ulong*  address);
    public static unsafe (Vector128<float>  Value1, Vector128<float>  Value2) LoadVector128x2AndUnzip(float*  address);
    public static unsafe (Vector128<double> Value1, Vector128<double> Value2) LoadVector128x2AndUnzip(double* address);

    // LD1 (multiple structures) 3 register variant
    public static unsafe (Vector128<byte>   Value1, Vector128<byte>   Value2, Vector128<byte>   Value3) LoadVector128x3AndUnzip(byte*   address);
    public static unsafe (Vector128<sbyte>  Value1, Vector128<sbyte>  Value2, Vector128<sbyte>  Value3) LoadVector128x3AndUnzip(sbyte*  address);
    public static unsafe (Vector128<short>  Value1, Vector128<short>  Value2, Vector128<short>  Value3) LoadVector128x3AndUnzip(short*  address);
    public static unsafe (Vector128<ushort> Value1, Vector128<ushort> Value2, Vector128<ushort> Value3) LoadVector128x3AndUnzip(ushort* address);
    public static unsafe (Vector128<int>    Value1, Vector128<int>    Value2, Vector128<int>    Value3) LoadVector128x3AndUnzip(int*    address);
    public static unsafe (Vector128<uint>   Value1, Vector128<uint>   Value2, Vector128<uint>   Value3) LoadVector128x3AndUnzip(uint*   address);
    public static unsafe (Vector128<long>   Value1, Vector128<long>   Value2, Vector128<long>   Value3) LoadVector128x3AndUnzip(long*   address);
    public static unsafe (Vector128<ulong>  Value1, Vector128<ulong>  Value2, Vector128<ulong>  Value3) LoadVector128x3AndUnzip(ulong*  address);
    public static unsafe (Vector128<float>  Value1, Vector128<float>  Value2, Vector128<float>  Value3) LoadVector128x3AndUnzip(float*  address);
    public static unsafe (Vector128<double> Value1, Vector128<double> Value2, Vector128<double> Value3) LoadVector128x3AndUnzip(double* address);
    
    // LD1 (multiple structures) 4 register variant            
    public static unsafe (Vector128<byte>   Value1, Vector128<byte>   Value2, Vector128<byte>   Value3, Vector128<byte>   Value4) LoadVector128x4AndUnzip(byte*   address);
    public static unsafe (Vector128<sbyte>  Value1, Vector128<sbyte>  Value2, Vector128<sbyte>  Value3, Vector128<sbyte>  Value4) LoadVector128x4AndUnzip(sbyte*  address);
    public static unsafe (Vector128<short>  Value1, Vector128<short>  Value2, Vector128<short>  Value3, Vector128<short>  Value4) LoadVector128x4AndUnzip(short*  address);
    public static unsafe (Vector128<ushort> Value1, Vector128<ushort> Value2, Vector128<ushort> Value3, Vector128<ushort> Value4) LoadVector128x4AndUnzip(ushort* address);
    public static unsafe (Vector128<int>    Value1, Vector128<int>    Value2, Vector128<int>    Value3, Vector128<int>    Value4) LoadVector128x4AndUnzip(int*    address);
    public static unsafe (Vector128<uint>   Value1, Vector128<uint>   Value2, Vector128<uint>   Value3, Vector128<uint>   Value4) LoadVector128x4AndUnzip(uint*   address);
    public static unsafe (Vector128<long>   Value1, Vector128<long>   Value2, Vector128<long>   Value3, Vector128<long>   Value4) LoadVector128x4AndUnzip(long*   address);
    public static unsafe (Vector128<ulong>  Value1, Vector128<ulong>  Value2, Vector128<ulong>  Value3, Vector128<ulong>  Value4) LoadVector128x4AndUnzip(ulong*  address);
    public static unsafe (Vector128<float>  Value1, Vector128<float>  Value2, Vector128<float>  Value3, Vector128<float>  Value4) LoadVector128x4AndUnzip(float*  address);
    public static unsafe (Vector128<double> Value1, Vector128<double> Value2, Vector128<double> Value3, Vector128<double> Value4) LoadVector128x4AndUnzip(double* address);

Contributes to #84510
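
For readers unfamiliar with these overloads, here is a minimal usage sketch. It assumes, as the `AndUnzip` name suggests, that each load de-interleaves memory so that `Value1` receives elements 0, 2, 4, ... and `Value2` receives elements 1, 3, 5, ... It also assumes the 128-bit overloads are exposed on `AdvSimd.Arm64` in a runtime that includes this change. `UnzipSketch` and `SplitPairs` are hypothetical names, not part of the PR, and the project must allow unsafe code.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class UnzipSketch
{
    // Splits interleaved (x, y) pairs into two separate spans.
    // Uses the Arm64 intrinsic when available, otherwise a scalar loop.
    // Hypothetical helper for illustration only.
    public static unsafe void SplitPairs(ReadOnlySpan<float> interleaved, Span<float> xs, Span<float> ys)
    {
        int pairs = interleaved.Length / 2;
        if (xs.Length < pairs || ys.Length < pairs)
            throw new ArgumentException("Destination spans are too short.");

        int i = 0;

        if (AdvSimd.Arm64.IsSupported)
        {
            fixed (float* src = interleaved)
            {
                // Each call reads 8 consecutive floats (4 pairs).
                // Assumption: Value1 holds the 4 x values, Value2 the 4 y values.
                for (; i + 4 <= pairs; i += 4)
                {
                    var (x, y) = AdvSimd.Arm64.LoadVector128x2AndUnzip(src + 2 * i);
                    x.CopyTo(xs.Slice(i));
                    y.CopyTo(ys.Slice(i));
                }
            }
        }

        // Scalar path: handles the tail, or everything on non-Arm64 hardware.
        for (; i < pairs; i++)
        {
            xs[i] = interleaved[2 * i];
            ys[i] = interleaved[2 * i + 1];
        }
    }
}
```

The `IsSupported` guard keeps the sketch usable on other architectures, where only the scalar loop runs.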

@dotnet-issue-labeler

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

ghost assigned TIHan Oct 28, 2023

ghost commented Oct 28, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: TIHan
Assignees: -
Labels: area-System.Runtime.Intrinsics, new-api-needs-documentation

Milestone: -

TIHan (Contributor, Author) commented Oct 30, 2023

@dotnet/jit-contrib @kunalspathak this is ready

TIHan marked this pull request as ready for review October 30, 2023 14:22

@@ -9034,6 +9184,111 @@ public new abstract class Arm64 : ArmBase.Arm64
/// </summary>
public static unsafe Vector128<ulong> LoadVector128(ulong* address) => LoadVector128(address);

/// <summary>
/// A64: LD1 { Vn.16B, Vn+1.16B }, [Xn]

TIHan (Contributor, Author):

@tannergooding I think the register arrangement noted in this comment should be:

///   A64: LD1 { Vn.8B, Vn+1.8B }, [Xn]

The register arrangements in the comments on the other LoadVector64x2/3/4* methods (sbyte, short, ushort, long, ulong, float) should be adjusted as well:

///   A64: LD1 { Vn.4H, Vn+1.4H }, [Xn]
///   A64: LD1 { Vn.2S, Vn+1.2S }, [Xn]

https://developer.arm.com/documentation/102159/0400/Load-and-store---data-structures

Member:

Yes, these should be half the sizes, i.e. 8B, 4H, etc.

Member:

Looks like it, yeah. I definitely might've made a mistake in documenting some of the things. Lots of APIs and lots of copy/pasting
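
The correction discussed in this thread follows from simple arithmetic: the lane count in an A64 arrangement specifier is the register width divided by the element width, so a 64-bit register holds half as many lanes as a 128-bit one. A minimal sketch of that rule follows; `ArrangementSketch` and `Arrangement` are hypothetical names for illustration and are not part of the runtime.

```csharp
using System;

public static class ArrangementSketch
{
    // Returns the A64 arrangement specifier (for example "8B" or "4S") for a
    // vector register of the given width holding elements of the given size.
    // Hypothetical helper for illustration only.
    public static string Arrangement(int vectorBits, int elementBytes)
    {
        if (vectorBits != 64 && vectorBits != 128)
            throw new ArgumentOutOfRangeException(nameof(vectorBits));

        // Element-size letter: Byte, Halfword, Single word, Doubleword.
        char suffix = elementBytes switch
        {
            1 => 'B',
            2 => 'H',
            4 => 'S',
            8 => 'D',
            _ => throw new ArgumentOutOfRangeException(nameof(elementBytes)),
        };

        // Lane count = register width in bytes / element width in bytes.
        int lanes = vectorBits / 8 / elementBytes;
        return $"{lanes}{suffix}";
    }
}
```

For example, `Arrangement(64, sizeof(byte))` yields `8B`, while `Arrangement(128, sizeof(byte))` yields `16B`, the value that had been copied into the `Vector64` comments.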

kunalspathak (Member) left a comment:

LGTM

TIHan (Contributor, Author) commented Oct 30, 2023

@kunalspathak I'll make another PR to fix the comments for the LoadVector64x2/3/4, merging this now.

TIHan merged commit 4b5756d into dotnet:main Oct 30, 2023
193 of 195 checks passed
TIHan deleted the arm64-load-vector-and-unzip branch October 30, 2023 21:32
kunalspathak (Member) commented:

cc: @a74nh @SwapnilGaikwad

ghost locked as resolved and limited conversation to collaborators Nov 30, 2023